Search CORE

302 research outputs found

Efficient known ncRNA search including pseudoknots

Author: Cheng Yuan
Yanni Sun
Publication venue: Springer Nature
Publication date: 21/01/2013

BACKGROUND: Searching for members of characterized ncRNA families containing pseudoknots is an important component of genome-scale ncRNA annotation. However, state-of-the-art known ncRNA search is based on context-free grammars (CFGs), which cannot effectively model pseudoknots. Thus, existing CFG-based ncRNA identification tools usually ignore pseudoknots during search. As a result, dozens of sequences that do not contain the native pseudoknots are reported by these tools. When pseudoknot structures are vital to the functions of the ncRNAs, these sequences may not be true members. RESULTS: In this work, we design a pseudoknot search tool using multiple simple sub-structures, which are derived from knot-free and bifurcation-free structural motifs in the underlying family. We test our tool on a contiguous 22-Mb region of the maize genome. The experimental results show that our work competes favorably with other pseudoknot search methods. CONCLUSIONS: Our sub-structure based tool can conduct genome-scale pseudoknot-containing ncRNA search effectively and efficiently. It provides a complementary pseudoknot search tool to Infernal. The source code is available at http://www.cse.msu.edu/~chengy/knotsearch
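To illustrate what makes a pseudoknot hard for a context-free grammar, the following minimal Python sketch checks a list of base pairs for crossing (non-nested) pairs. It is not taken from the tool described above; the function name `has_pseudoknot` and the pair representation are assumptions made for illustration only.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross.

    `pairs` is a list of (i, j) index tuples (hypothetical representation).
    Two pairs (i, j) and (k, l) with i < k cross when i < k < j < l;
    nested or side-by-side pairs do not cross and can be modeled by a CFG.
    """
    # Normalize each pair so the smaller index comes first, then sort.
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            return True
    return False
```

For example, the pairs (0, 5) and (3, 8) cross, while (0, 10) and (2, 8) are nested and (0, 3) and (4, 7) sit side by side.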

Springer - Publisher Connector

PubMed Central

Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning

Author: Shang Jiayu
Sun Yanni
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2021

Background: Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of viral-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches for predicting which hosts particular viruses can infect. However, there are two main challenges for computational host prediction. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryote hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results: In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. Then a graph convolutional network (GCN) is adopted to exploit viruses with or without known hosts in training to enhance the learning ability. During the GCN training, we minimize the expected calibrated error (ECE) to ensure the confidence of the predictions. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion: HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa.
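As a rough illustration of the calibration measure mentioned in the abstract (more commonly called the expected calibration error), the sketch below computes ECE with equal-width confidence bins. It only evaluates the metric on given predictions; it does not reproduce how HostG uses it during training. The function name, the bin count, and the input format are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|.

    `confidences` are predicted probabilities in [0, 1] for the chosen class;
    `correct` are booleans (or 0/1) saying whether each prediction was right.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign every prediction to one of n_bins equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model has an ECE of zero: within each bin, the fraction of correct predictions equals the average stated confidence.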

arXiv.org e-Print Archive

Directory of Open Access Journals

Choosing the best heuristic for seeded alignment of DNA sequences

Author: Buhler Jeremy
Sun Yanni
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. RESULTS: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. CONCLUSION: Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at
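The idea of a seed that tolerates transitions at fixed positions can be sketched as follows. The seed notation here is hypothetical and chosen for illustration ('1' requires a match, 't' accepts a match or a transition, '0' accepts anything); it is not the notation used by Mandala, BLASTZ, or BLASTN.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hits(seed, x, y):
    """Return the offsets at which `seed` matches the ungapped pairing of x and y.

    Hypothetical seed alphabet: '1' = bases must match, 't' = match or
    transition allowed, '0' = don't care. x and y must have equal length.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    hits = []
    for offset in range(len(x) - len(seed) + 1):
        ok = True
        for k, symbol in enumerate(seed):
            a, b = x[offset + k], y[offset + k]
            if symbol == "1" and a != b:
                ok = False
            elif symbol == "t" and a != b and (a, b) not in TRANSITIONS:
                ok = False
            if not ok:
                break
        if ok:
            hits.append(offset)
    return hits
```

With x = "ACGTA" and y = "ATGTA", the contiguous seed "111" hits only at offset 2, while "1t1" also hits at offset 0 because the C/T mismatch is a transition.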

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Designing seeds for similarity search in genomic DNA

Author: Buhler Jeremy
Keich Uri
Sun Yanni
Publication venue: Elsevier Inc.
Publication date: 31/05/2005

Abstract: Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods: Companion Methods Enzymol 266 (1996) 460, J. Mol. Biol. 215 (1990) 403, Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or “seed” of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
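Seed sensitivity can be made concrete with a small dynamic program. The sketch below is a simplified stand-in, not the authors' automaton-based algorithm: it assumes each alignment position matches independently with probability p (a zeroth-order model rather than a general Markov model) and tracks the last few match/mismatch bits as the state. The function name and the '1'/'0' seed notation are assumptions.

```python
from collections import defaultdict


def seed_sensitivity(seed, length, p):
    """Probability that `seed` hits a random ungapped alignment of `length` positions.

    `seed` is a string over '1' (match required) and '0' (don't care) that
    starts with '1'. Each alignment position is a match with probability p,
    independently of the others (a simplifying assumption).
    """
    if not seed or seed[0] != "1" or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a '0'/'1' string starting with '1'")
    span = len(seed)
    seed_mask = int(seed, 2)            # leftmost seed symbol = oldest position
    low_mask = (1 << (span - 1)) - 1    # keeps the most recent span-1 bits

    # states maps "recent match bits" -> probability of reaching them with no hit so far.
    states = {0: 1.0}
    for _ in range(length):
        nxt = defaultdict(float)
        for bits, prob in states.items():
            for bit, bit_prob in ((1, p), (0, 1.0 - p)):
                window = (bits << 1) | bit
                if window & seed_mask == seed_mask:
                    continue            # seed hit: this mass leaves the no-hit set
                nxt[window & low_mask] += prob * bit_prob
        states = nxt
    return 1.0 - sum(states.values())
```

The state space grows as 2^(span-1), so this direct version is practical only for short seeds; it is meant to show the quantity being optimized rather than an efficient way to compute it.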

Elsevier - Publisher Connector

Comparison of characteristics and mortality in multidrug resistant (MDR) and non-MDR tuberculosis patients in China

Author: Harley David
Sleigh Adrian
Sun Yanni
Vally Hassan
Publication venue: Springer Science and Business Media LLC
Publication date: 29/11/2018

BACKGROUND: We conducted a cohort study to compare the characteristics of MDR-TB with non-MDR-TB patients, to measure the long-term (9-year) mortality rate, and to determine factors associated with death in China. METHODS: We reviewed the medical records of 250 TB cases from a 2001 survey to compare 100 MDR-TB patients with 150 non-MDR-TB patients who were treated in 2001-2002. Baseline attributes extracted from the records were compared between the two cohorts, and long-term mortality and risk factors were determined at nine-year follow-up in 2010. RESULTS: Among the 234 patients successfully followed up, 63 (26.9%) were female and 171 (73.1%) were male. MDR-TB patients had poorer socioeconomic status compared to non-MDR-TB patients. Nine years after the diagnosis of TB, 69 (29.5%) of the 234 patients had died (32 or 21.6% of non-MDR-TB versus 37 or 43.0% of MDR-TB) and the overall mortality rate was 39/1000 per year (PY) (27/1000 PY among non-MDR-TB versus 63/1000 PY among MDR-TB). Factors associated with death included: MDR status (hazard ratio (HR): 1.86; CI: 1.09-3.13), limited education of primary school or lower (HR: 2.51; CI: 1.34-4.70) and having received TB treatment during the nine-year period (HR: 1.82; 95% CI: 1.02-3.26). CONCLUSIONS: MDR-TB was a strong predictor for poor long-term outcome. High quality diagnosis and treatment must be ensured. Greater reimbursement or free treatment may be needed to provide access for the poor and vulnerable populations, and to increase treatment compliance. Funding for the study was provided by the National Centre for Epidemiology and Population Health, Australian National University, as the PhD study project for Yanni Sun, a PhD candidate at the Centre.
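The percentages reported in the abstract can be cross-checked with a few lines of arithmetic. In the sketch below, the counts 63, 171, 69, 32, 37 and 234 come from the abstract; the group sizes at follow-up (148 non-MDR-TB and 86 MDR-TB) are not stated there and are inferred here from the reported percentages, so they should be read as an assumption.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


FOLLOWED_UP = 234
NON_MDR_FOLLOWED = 148   # assumption: inferred from 32 deaths being 21.6%
MDR_FOLLOWED = 86        # assumption: inferred from 37 deaths being 43.0%

summary = {
    "female": pct(63, FOLLOWED_UP),
    "male": pct(171, FOLLOWED_UP),
    "died_overall": pct(69, FOLLOWED_UP),
    "died_non_mdr": pct(32, NON_MDR_FOLLOWED),
    "died_mdr": pct(37, MDR_FOLLOWED),
}

# Crude ratio of the reported mortality rates (63 vs 27 per 1000 person-years).
crude_rate_ratio = round(63 / 27, 2)
```

The inferred group sizes sum to the 234 patients followed up, and the crude rate ratio is unadjusted, which is why it differs from the reported hazard ratio of 1.86.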

The Australian National University

PhaBOX: A web server for identifying and characterizing phage contigs in metagenomic data

Author: Liao Herui
Peng Cheng
Shang Jiayu
Sun Yanni
Tang Xubo
Publication date: 27/03/2023

Motivation: There is accumulating evidence showing the important roles of bacteriophages (phages) in regulating the structure and functions of the microbiome. However, the lack of easy-to-use, integrated phage analysis software hampers microbiome-related research from incorporating phages in the analysis. Results: In this work, we developed a web server, PhaBOX, to comprehensively identify and analyze phage contigs in metagenomic data. To the best of our knowledge, this is the first web server that supports integrated phage analysis, including detecting phage contigs from the metagenomic assembly, lifestyle prediction, taxonomic classification, and host prediction. Instead of treating the algorithms as a black box, PhaBOX also supports visualization of the essential features for making predictions. With the user-friendly graphical interface, users with or without informatics training can easily use the web server for analyzing phages in microbiome data. Availability: The web server of PhaBOX is available via: https://phage.ee.cityu.edu.hk. The source code of PhaBOX is available via: https://github.com/KennthShang/PhaBOX

arXiv.org e-Print Archive

Distinct composition and amplification dynamics of transposable elements in sacred lotus (Nelumbo nucifera Gaertn.)

Author: Cerbin Stefan
Jiang Ning
Li Yang
Ou Shujun
Sun Yanni
Publication venue: Wiley
Publication date: 12/08/2022

Sacred lotus (Nelumbo nucifera Gaertn.) is a basal eudicot plant with a unique lifestyle, physiological features, and evolutionary characteristics. Here we report the unique profile of transposable elements (TEs) in the genome, using a manually curated repeat library. TEs account for 59% of the genome, and hAT (Ac/Ds) elements alone represent 8%, more than in any other known plant genome. About 18% of the lotus genome is comprised of Copia LTR retrotransposons, and over 25% of them are associated with non-canonical termini (non-TGCA). Such high abundance of non-canonical LTR retrotransposons has not been reported for any other organism. TEs are very abundant in genic regions, with retrotransposons enriched in introns and DNA transposons primarily in flanking regions of genes. The recent insertion of TEs in introns has led to significant intron size expansion, with a total of 200 Mb in the 28 455 genes. This is accompanied by declining TE activity in intergenic regions, suggesting distinct control efficacy of TE amplification in different genomic compartments. Despite the prevalence of TEs in genic regions, some genes are associated with fewer TEs, such as those involved in fruit ripening and stress responses. Other genes are enriched with TEs, and genes in epigenetic pathways are the most associated with TEs in introns, indicating a dynamic interaction between TEs and the host surveillance machinery. The dramatic differential abundance of TEs with genes involved in different biological processes as well as the variation of target preference of different TEs suggests the composition and activity of TEs influence the path of evolution

KU ScholarWorks

PubMed Central

Research on One Novel Logging Interpretation Method of CBM Reservoir

Author: FENG Qing
LI Shengsheng
LI Xiaonan
SUN Yanni
WANG Xuxing
ZENG Ming
Publication venue: Canadian Research & Development Center of Sciences and Cultures
Publication date: 26/12/2020

Coalbed methane (CBM) is a kind of natural gas which is stored in the micropores and fractures of the “coal seam” and has not been transported out of the source rock. Conventional logging technology plays an important role in coalbed methane exploration and development. By analyzing the response characteristics of conventional logging of coalbed methane, coal-bearing strata are accurately determined. Two methods, a statistical model and a volume model, are established to analyze and calculate industrial components. Based on the study of the adsorption isotherm and the correlation between logging parameters and coal core gas content, the calculation method of coal seam gas content is determined. In practice, the calculation accuracy of industrial components and gas content of the coal seam has been significantly improved.

CSCanada.net: E-Journals (Canadian Academy of Oriental and Occidental Culture, Canadian Research & Development Center of Sciences and Cultures)

2-(2-Iodophenyl)-1,2,3,4-tetrahydroisoquinoline-1-carbonitrile

Author: Abe
Feng Zheng
Ishii
Kamal
Lane
Le Zhou
Liu
Sheldrick
Storch
Wenwen Sun
Wright
Yanni Ma
Yifang Sun
Publication venue: International Union of Crystallography
Publication date: 01/06/2011

In the title compound, C16H13IN2, the two benzene rings make a dihedral angle of 67.26 (5)°. The six-membered heterocycle of the tetrahydroisoquinoline unit adopts a half-chair conformation. In the crystal, adjacent molecules are linked by pairs of weak intermolecular C—H⋯N hydrogen bonds, forming inversion dimers. An intramolecular C—H⋯I close contact is also observed

Crossref

Directory of Open Access Journals

PubMed Central